Azure AI Foundry: The AI App and Agent Factory

Session Date: May 19, 2025
Duration: 1 hour
Venue: Build 2025 Conference - BRK155
Speakers: Yina Arenas (VP Azure AI Foundry, Microsoft), Scott Hanselman (Vice President Developer Community, Microsoft)
Link: [Microsoft Build 2025 Session BRK155]

Azure AI Foundry Platform

Executive Summary

This flagship Azure AI Foundry session showcases how Microsoft’s comprehensive AI platform transforms real-world workflows through an ambitious live demonstration: automating Scott Hanselman’s 20-year podcast production process. Yina Arenas and Scott Hanselman demonstrate the platform’s three core pillars�models, agents, and observability�through extensive live coding, revealing how developers can build production-ready AI applications with integrated signals loops that connect model choice, knowledge retrieval, fine-tuning, orchestration, and memory.


Key Topics Covered

?? 1. The Real-World AI Challenge: Hanselminutes Podcast Factory

The 20-Year Corpus Challenge

Scott’s Opening Context: > “I’ve got this podcast that I do on the side… I just published episode 997… There’s over 500 hours of material. I was actually almost delisted by a podcast directory when a young person sent an email saying that they believed that the podcast itself was AI generated because they couldn’t conceive about that amount of work.”

The Automation Philosophy:

  • “Only take away the parts of the job that suck” - Preserve creative work, automate toil
  • Human-in-the-loop approach - Maintain editorial control and quality oversight
  • Real-world constraints - Small team (Scott + Mandy), minimal budget, maximum efficiency

Production Pain Points and Toil

Weekly Administrative Burden:

  • Show notes generation - “I’m notorious for saying I’m going to put that in the show notes and then you’ll never hear from me again”
  • Guest biography research - Manual Wikipedia searches and bio compilation
  • Link verification - Ensuring references are accurate and functional
  • Transcript generation - Converting audio to searchable text content
  • Scheduling coordination - Email back-and-forth with potential guests

Time Investment:

  • 2-3 hours weekly of administrative work for 20+ years
  • Goal reduction to 10-15 minutes with AI assistance
  • Scalability concern - Process must work for next 1,000 episodes over 20 years

?? 2. Azure AI Foundry: Three-Pillar Architecture

Platform Philosophy

Yina’s Definition: > “The open, flexible, and secure platform that enables you as a developer to infuse AI in every single application that you do. Whether it is a brand new application, or something that you’ve been working for a while.”

Three Core Pillars:

Azure AI Foundry
??? Models: Extensive catalog with intelligent selection
??? Agents: Agentic platform for orchestration and memory
??? Observability: Production monitoring and evaluation tools

Developer Journey:

  • Idea to code - Rapid prototyping and experimentation
  • Code to production - Enterprise-ready deployment and monitoring
  • Integrated signals loop - Continuous improvement through observability

?? 3. Models: The Foundation Layer

Model Explosion and Selection Challenge

Historical Context: > “Two years ago the world was actually very simple. We had just a few foundational models. But the range of change has been unprecedented over the last couple of years.”

Current Ecosystem:

  • 10,000+ models available through Hugging Face partnership
  • Daily updates with latest model releases and capabilities
  • Multiple dimensions - Provider, industry, capabilities, deployment type, task support

Live Demo: Model Catalog Navigation

Intelligent Discovery Features:

  • Built-in agent assistance - “Best model for protein research” queries
  • Leaderboard comparisons - Quality, safety, cost, throughput benchmarks
  • Trade-off visualization - Quality vs. cost, quality vs. throughput charts
  • Dynamic evaluation - Real benchmarks, not hard-coded rankings

New Model Providers:

  • OpenAI family - GPT-4.1, nano, mini models
  • DeepSeek, Mistral, Grok - Expanded language model options
  • Meta, Black Forest Lab - Specialized and research models
  • Foundry Labs - Microsoft Research model integration

Model Router: Intelligent Cost Optimization

Revolutionary Approach: > “Model-router takes the selection toil from your head… it is a router on top of like the best models.”

Live Demonstration Results:

Simple Query: "Where is Hanoi?" ? Nano model (cost-efficient)
Complex Planning: "Plan a trip with constraints" ? Mini model (balanced capability)
Complex Reasoning: Multi-step problems ? Full reasoning model

Performance Benefits:

  • 60% cost reduction compared to always using GPT-4.1
  • 1-2% accuracy reduction - Minimal quality impact
  • Environmental efficiency - Reduced computational resources
  • Automatic optimization - No manual model selection required

Fine-Tuning and Distillation Success

Hanselminutes Model Economics:

Original Processing Cost: $100 for 1,000 episodes
Fine-tuned Model Cost: $1.50 for next 1,000 episodes
Cost Reduction: 98.5% savings through specialization

Technical Achievement:

  • Whisper-based transcription for initial corpus processing
  • Distilled specialized model - Only does show notes, no general knowledge
  • Developer tier announcement - Reduced hosting fees for experimentation

?? 4. Foundry Local: Hybrid AI Architecture

Local AI Capabilities

On-Device Processing Demo:

  • Phi-4 mini model running locally on laptop GPU
  • Biographical research combining Perplexity (cloud) + local summarization
  • GPU memory monitoring - Real-time resource utilization display
  • Instant startup/shutdown - foundry service stop releases resources

Live Demo: Biography Generation

Hybrid Processing Workflow:

  1. Cloud research phase - Perplexity gathers comprehensive data
  2. Local processing - Phi-4 mini summarizes and formats on GPU
  3. Quality control - Immediate feedback on accuracy and relevance

Demonstration Results:

  • First attempt - Incorrect biographical details (wrong Yina)
  • Second attempt - Accurate professional summary and background
  • Real-time correction - Shows AI fallibility and recovery

?? 5. Agents: Beyond Tools to Orchestrated Intelligence

Defining the Agent Paradigm

Scott’s Practical Definition: > “AI’s don’t have feet and they don’t have hands and they can’t do stuff… I feel like an agent might be able to call multiple tools, but an agent has a task where it’s like, take this and do it and then let me know when you’re finished.”

Yina’s Technical Framework: > “The difference is now you’re having an LLM that is helping you drive the control flow of the program. It’s making some of those decisions for you.”

Agent Capabilities:

  • Tool orchestration - Coordinated use of multiple capabilities
  • Memory management - Context preservation across interactions
  • Multi-modal inputs - Text, speech, images, video processing
  • Agent-to-agent communication - Collaborative task execution

Agent Service: General Availability Announcement

Enterprise-Ready Platform Features:

  • Declarative agent creation - Configuration-based development
  • Cloud execution - Managed scaling and resource allocation
  • Enterprise integration - Custom file storage, network isolation
  • Authentication support - API token management and security
  • Built-in connectors - Fabric, SharePoint, Bing, Azure AI Search

Interoperability Standards:

  • A2A (Agent-to-Agent) - Cross-platform agent communication
  • MCP (Model Context Protocol) - Standardized tool integration
  • LangChain and CrewAI - Popular framework compatibility
  • OpenAI APIs - Assistants and Response API support

Live Demo: Specialized Agent Creation

Show Notes Agent Implementation:

  • Fine-tuned nano model - Specialized for podcast show note generation
  • Cost optimization - Smallest possible model for specific task
  • Quality preservation - Maintains accuracy while reducing resource usage
  • Visual Studio integration - One-click export to development environment

Agent Catalog Ecosystem:

  • Microsoft Build templates - Ready-to-use agent configurations
  • Partner contributions - Third-party specialized agents
  • Stanford healthcare example - Academic research integration

??? 6. Multi-Agent Orchestration: The Podcast Factory

Voice Orchestrator Agent

Natural Language Interface: > “Hey, are you there and can you give me a list of the things that you can do to help me with the podcast production?”

Demonstrated Capabilities:

  • Guest intake collaboration - Brainstorming and outreach strategies
  • Bio generation - Automated speaker biography creation
  • Transcript processing - Audio-to-text conversion and formatting
  • Show notes generation - Structured content creation
  • Link verification - URL validation and accuracy checking

Advanced Search and Knowledge Retrieval

Agentic Retrieval Innovation: > “This is not just like a one shot… this is what we call agentic retrieval on Azure AI Search.”

Complex Query Processing:

Query: "Over the last 20 years are there any recurring ideas, 
or universal truths in the last 1,000 episodes that have emerged?"

Processing:
??? Query decomposition into multiple sub-questions
??? Parallel information retrieval across corpus
??? Synthesis of findings across temporal patterns
??? Thematic analysis: Pragmatism, mentorship, community

Content Understanding and Schema Definition

Structured Data Extraction:

  • Speaker identification - Multiple voice recognition and separation
  • Topic extraction - Thematic categorization of episode content
  • Quote identification - Notable statements and insights
  • Title and summary generation - Automated metadata creation

Technical Implementation:

  • Azure AI Content Understanding service integration
  • Custom schema definition for podcast-specific fields
  • Multi-speaker detection - Including advertisement voice differentiation

?? 7. Semantic Kernel and AutoGen Integration

Framework Convergence Announcement

Strategic Platform Unification: > “AutoGen has been coming in as an agentic framework from our Microsoft research organization. Semantic Kernel is the one that we’ve been recommending for using in production… we’re bringing them together into like one agentic framework.”

Technical Implementation:

  • Shared runtime - Common execution environment
  • Production focus - Semantic Kernel for enterprise deployment
  • Research innovation - AutoGen experimental capabilities
  • Framework convergence - Single unified platform roadmap

Live Code Demonstration

Multi-Agent Workflow Definition:

// Semantic Kernel function integration
var contentAgent = kernel.CreateFunction("ProcessShowNotes");
var linkVerifier = kernel.CreateFunction("VerifyLinks");
var summarizer = kernel.CreateFunction("CreateSummary");

// Workflow orchestration
var workflow = new SequentialWorkflow()
    .AddStep(transcriptAnalysis)
    .AddStep(contentGeneration) 
    .AddStep(linkVerification)
    .AddStep(summaryGeneration);

Visual Workflow Management:

  • Mermaid chart generation - Visual workflow representation
  • YAML abstraction - Configuration without direct YAML editing
  • Process visualization - Clear understanding of agent interactions

?? 8. Observability: Production-Ready AI Applications

Comprehensive Monitoring Stack

Development Lifecycle Support: > “We have this set of tools to support your entire development lifecycle. Whether you are like starting to experiment, or like thinking about going to production.”

OpenTelemetry Integration: Scott’s Context: > “OTel is one of the biggest things… it’s effectively distributed log files and when you have large complicated distributed systems like this, you want all that to come to one place.”

Monitoring Capabilities:

  • Distributed tracing - End-to-end request flow visibility
  • Azure Monitor integration - Enterprise monitoring platform compatibility
  • Custom dashboards - Grafana and existing observability tools
  • Real-time metrics - Performance and cost optimization insights

AI-Specific Evaluation Framework

Agent-Centric Observability:

  • Conversation threads - Complete dialogue context and memory
  • Agent runs - Individual interaction turn analysis
  • Input/output tracking - Complete data flow monitoring
  • Metadata collection - Decision-making process transparency

Built-in Evaluation Metrics:

  • Intent resolution - Task completion accuracy assessment
  • Relevance scoring - Response quality and context alignment
  • Code vulnerability detection - Security risk identification
  • Task adherence - Agent focus and scope compliance
  • Indirect jailbreak protection - Security against malicious input

CI/CD Pipeline Integration

DevOps Automation:

  • GitHub Actions integration - Automated evaluation in deployment pipeline
  • Quality gates - Automated testing before production release
  • Performance tracking - Historical quality and performance metrics
  • Regression detection - Automatic identification of capability degradation

?? 9. Enterprise Security and Governance

End-to-End Security Architecture

Identity and Access Management:

  • Entra ID integration - Agents as first-class directory entities
  • Specific agent identities - Individual permissions and governance
  • Principle of least privilege - Minimal required access rights
  • Audit trail - Complete action and decision logging

Data Protection and Compliance

Microsoft Purview Integration:

  • Data labeling - Automatic content classification
  • Data protection - Encryption and access control
  • Compliance monitoring - Regulatory requirement adherence
  • Microsoft Defender - Security threat detection and response

Grounding and Restriction Controls

Business Context Enforcement: > “It can’t be overstated how important grounding is… I don’t want my chatbot to talk about things that aren’t this… You don’t want your coffee shop chatbot offering relationship advice.”

Implementation Strategies:

  • Domain-specific models - Trained only for specific business contexts
  • Knowledge source restriction - Limited to approved information sources
  • Custom search scopes - Controlled web access and information retrieval
  • Response filtering - Output validation and appropriateness checking

Live Demonstration Results

Podcast Factory End-to-End Processing

Real-Time Session Recording:

  • Complete workflow execution - From raw recording to finished show notes
  • Multi-agent coordination - Seamless handoff between specialized agents
  • Quality validation - Human oversight at critical decision points
  • Production-ready output - Immediately usable show notes and transcripts

Generated Content Quality

Show Notes Output:

# Notable Quotes
- "Imposture syndrome means you're doing something okay"
- "Thoughts are not facts" (attributed to guest)

# Key Takeaways  
- Model router provides 60% cost savings
- Grounding essential for business-focused AI
- Human-in-the-loop maintains quality control

# Resources Mentioned
- BRK155 Build Session
- Azure AI Foundry (AI.Azure.com)
- Scott's YouTube Channel [MISSING LINK - FLAGGED]
- Hanselminutes Episode 881 with Raji

Technical Performance Metrics

Cost Optimization Results:

  • Model router efficiency - 60% cost reduction with 1-2% accuracy loss
  • Fine-tuning economics - 98.5% cost reduction for specialized tasks
  • Local processing - Zero cloud costs for summarization tasks
  • Resource efficiency - Intelligent model selection based on query complexity

Agent Orchestration Success

Multi-Agent Workflow:

  1. Transcript generation - Whisper-based audio processing
  2. Content analysis - Topic extraction and speaker identification
  3. Show notes creation - Structured content generation
  4. Link verification - URL validation and accuracy checking
  5. Summary generation - Final output compilation

Session Highlights

“I feel very strongly about this work and as a human who created some art, I want to make sure that I’m going to be using AI in a way that only takes away the parts of the job that suck.” - Scott Hanselman

“We took this process that Scott has been doing for 20 years… to erase that toil at scale.” - Yina Arenas

“Model-router takes the selection toil from your head… you can get up to 60% price, like, off in the price.” - Yina Arenas

“The difference is now you’re having an LLM that is helping you drive the control flow of the program. It’s making some of those decisions for you.” - Yina Arenas

“What factory are you going to build? How are you going to like bring that investment to be, instead of return of investment, return on your effort?” - Yina Arenas


Implementation Guide

Getting Started with Azure AI Foundry

1. Model Selection and Optimization Strategy

**Model Catalog Navigation:**

- Use built-in agent for domain-specific model recommendations
- Leverage leaderboards for cost/quality/throughput optimization
- Implement model router for automatic cost optimization
- Consider fine-tuning for specialized use cases

**Cost Optimization Approach:**

- Start with model router for automatic selection
- Measure baseline costs and performance
- Identify opportunities for fine-tuning
- Implement distilled models for specific tasks

2. Agent Development Workflow

// Agent Service Implementation Pattern
var agent = await foundry.CreateAgentAsync(new AgentConfig
{
    Name = "ShowNotesAgent",
    Model = "gpt-4.1-nano-finetuned", 
    Tools = { "content-understanding", "link-verification" },
    Knowledge = { "podcast-corpus", "guest-database" },
    Instructions = "Generate structured show notes with quotes and links"
});

// Multi-agent orchestration with Semantic Kernel
var workflow = kernel.CreateWorkflow()
    .AddAgent(transcriptAgent)
    .AddAgent(contentAgent) 
    .AddAgent(linkAgent)
    .AddAgent(summaryAgent);

3. Observability and Monitoring Setup

**Production Monitoring:**

- Enable OpenTelemetry tracing across agent workflows
- Configure Azure Monitor integration for dashboards
- Set up automated evaluation pipelines in CI/CD
- Implement quality gates and regression detection

**Evaluation Framework:**

- Define custom evaluation metrics for business context
- Implement A/B testing for agent improvements
- Monitor cost and performance trends over time
- Set up alerting for quality degradation

Best Practices for Production Deployment

Agent Design Principles

  • Single responsibility - Create focused agents for specific tasks
  • Composable workflows - Design agents to work together seamlessly
  • Human-in-the-loop - Maintain oversight at critical decision points
  • Fail gracefully - Implement fallback strategies and error handling

Security and Compliance

  • Principle of least privilege - Grant minimal required permissions
  • Data classification - Use Purview for automatic content labeling
  • Access control - Implement Entra ID integration for agent identities
  • Audit trails - Maintain complete logs of agent actions and decisions

Cost Management

  • Model right-sizing - Use smallest model capable of required quality
  • Router implementation - Automatic model selection based on query complexity
  • Local processing - Leverage on-premises resources for appropriate tasks
  • Usage monitoring - Track costs and optimize based on actual usage patterns

Advanced Applications

Content Creation Automation

Podcast Production Pipeline:

  • Guest sourcing - AI-powered guest recommendation and outreach
  • Interview preparation - Automated research and question generation
  • Content processing - Transcription, summarization, and metadata extraction
  • Publishing workflow - Social media content and promotional material generation

Enterprise Knowledge Management

Large Corpus Processing:

  • Historical content analysis - Pattern recognition across years of material
  • Semantic search - Natural language queries across comprehensive datasets
  • Content synthesis - Multi-source information combination and analysis
  • Trend identification - Emerging themes and topic evolution tracking

Multi-Modal AI Applications

Comprehensive Content Understanding:

  • Audio processing - Speech-to-text with speaker identification
  • Visual content - Image and video analysis integration
  • Document processing - Structured data extraction from various formats
  • Cross-modal synthesis - Combined analysis of multiple content types

Business Impact and ROI

Productivity Transformation

Hanselminutes Case Study Results:

  • Time savings - From 2-3 hours to 10-15 minutes weekly
  • Cost efficiency - 98.5% reduction in processing costs through fine-tuning
  • Quality maintenance - Human oversight preserves editorial standards
  • Scalability - Process designed for next 1,000 episodes over 20 years

Enterprise Adoption Metrics

Platform Usage Statistics:

  • 70,000+ customers using Azure AI Foundry for organizational AI
  • 10,000+ users of Azure AI Foundry Agent Service
  • Millions of agents created for business process automation
  • Production deployments across Gainsight, Saifr, Heineken, and more

Resources and Further Learning

Official Platform Access

Technical Documentation

About the Speakers

Yina Arenas
VP Azure AI Foundry
Microsoft
Leads Microsoft’s strategy for AI model ecosystems, enterprise AI agents, and AI developer experience. Platform builder whose innovations reach nearly a billion users, redefining how businesses scale with AI.

Scott Hanselman
Vice President, Developer Community
Microsoft
Developer for 30+ years, blogger for 20 years at hanselman.com. Nearly 1,000 episodes of Hanselminutes podcast and 750+ episodes of Azure Friday. Author and speaker to over one million developers worldwide.


This flagship session demonstrates Azure AI Foundry as Microsoft’s comprehensive platform for building production-ready AI applications, combining extensive model catalogs, sophisticated agent orchestration, and enterprise-grade observability into a unified development experience that transforms real-world business processes through intelligent automation.